Abstract

When there is more than one perspective from which to interpret a dataset, concordance (or discordance) between the result sets from the different perspectives plays an important role in obtaining refined results. For example, several clustering algorithms generate different results for the same input. To get useful insights, users need to combine different perspectives by checking the concordance between those results. In this paper, we present an interactive visualization tool called ConSet, with which users can effectively examine multiple sets at once. It uses a permutation matrix visualization to enable users to easily identify similar sets. In addition to a standard Venn diagram, we introduce a Fairy diagram that allows users to compare two or three sets without inconsistencies. We conducted a qualitative user study to evaluate how our tool works in comparison with a traditional set visualization tool based on a Venn diagram. We found that users performed better with ConSet than with the traditional interface for many tasks, and most users preferred ConSet.

CR Categories and Subject Descriptors: I.6.9.e Information visualization, H.5.2 User Interfaces, H.5.2.f Graphical user interfaces, H.1.2.a Human factors, H.2.8.e Data and knowledge visualization, H.2.8.h Interactive data exploration and discovery

Additional Keywords: set concordance, Venn diagram, Fairy diagram, Treemap, permutation matrix, cluster comparison, gene ontology

1 Introduction

When there is more than one way to approach a problem, it can be useful to combine multiple perspectives. Visualizing the concordance or discordance of those perspectives can help integrate important knowledge. Scientific problem solving usually involves concordance analysis among several perspectives. This includes problems in information retrieval, bioinformatics, data mining, and so on. For example, Google and MSN Search often return different search results. Users could have a more judicious view of the search term by comparing those results. Suppose scientists run an experiment and there are several semi-standard methods to acquire numerical values from a measurement device. The choice of a data acquisition method can profoundly change the resulting data interpretation. Without checking the concordance of different acquisition methods, scientists might have high false positive rates. For example, molecular biologists have to use a "probe set signal algorithm" to acquire signal values of genes from Affymetrix GeneChips. They get different sets of sufficiently powered genes in the subsequent power analysis depending on the signal algorithm used. The concordance of the result sets from different signal algorithms can be checked using set operations in connection with various concordance measures. This enables biologists to identify concordant/discordant genes. Therefore, they can sometimes significantly reduce the false positive rates by simply checking the concordance of the results of different algorithms.

Similar problems occur in later stages of the analysis. If the biologists decide to use clustering algorithms to identify important patterns in the acquired dataset, they have to decide which clustering algorithm to use. Resorting to only one clustering algorithm could bias the result, since different algorithms might come up with completely different patterns depending on how each algorithm detects clusters. Since most clustering algorithms generate disjoint sets (i.e., clusters), there are no similar sets in the result of one clustering algorithm. If we combine all clusters from two clustering algorithms, the concordance between those algorithms can be checked by seeing how many sets are similar to each other.

Another example is when one data element can be classified into multiple categories. For example, a gene product can be related to many gene ontology terms, and a web resource can be mapped to multiple categories in the Open Directory (www.dmoz.org). Identifying the elements classified into different categories helps users unveil unknown features of the element and of the dataset containing that element.

In existing information visualization tools, brushing and linking techniques [4] have been used to show some concordance. Coordinated highlighting of many views of the same dataset can reveal the intersection of sets. For example, comparing hierarchical clustering results using paired dendrogram views, or comparing phylogenetic trees using paired tree views, can be thought of as showing the concordance of two perspectives on a group of terminal nodes. Graph visualization can also be a candidate, since we can represent each set as a node and the relationship (similarity) of sets as links. While graph drawing techniques combined with a clustering approach can show an overview of relationships, such as similarities/dissimilarities among sets, it is not easy to incorporate intuitive ways to support important set operations.

We thought that a more general set visualization tool was necessary to support important tasks for concordance analysis of sets: (1) to show an overview of relationships between sets, (2) to aggregate and filter sets/elements according to users' interests, (3) to efficiently perform fundamental set operations such as intersection and difference, and (4) to generate deeper insight into the original problem from the concordance visualization.

In this paper, we present intuitive interfaces and interactions for set concordance analysis built upon existing visualization techniques such as the permutation matrix [6]. The information visualization mantra (overview first, zoom and filter, details on demand) is the underlying guideline of the design of ConSet (Figure 1).


Report Documentation Page

Title: Visualizing Concordance of Sets
Dates covered: 00-00-2006 to 00-00-2006
Performing organization: Human-Computer Interaction Lab, Department of Computer Science, University of Maryland, College Park, MD 20742
Distribution/availability statement: Approved for public release; distribution unlimited
Supplementary notes: The original document contains color images.
Security classification (report, abstract, this page): unclassified
Number of pages: 8


Figure 1. ConSet with 16 sets and 31 elements. The Permutation Matrix view shows an overview of the relationships among sets and elements. The Dynamic Control view on the right enables users to filter sets and elements. It also allows users to select two or three sets to show a diagram. The Diagram Ordering view at the bottom shows the top 10 diagrams of two or three sets.


Users can see an overview of relationships among sets and elements in a permutation matrix. Aggregation of elements with the same membership and filtering by dynamic query devices enable users to narrow down to a handful of important sets and elements. Users can also see the relationships between two or among three sets in the conventional Venn diagram or our novel Fairy diagram. We also ran a qualitative user study in comparison with VennMaster (shown in Figure 2), a set visualization tool that uses a generalized Venn diagram. The user study suggested that ConSet supports more tasks with fewer errors than VennMaster. The user study also enabled us to identify several ways to improve the interface and design of ConSet.

2 Related Work

Brushing and linking, a powerful information visualization technique, can be used to reveal concordances between sets. Coordinated multiple views provide users with ways to understand relationships between the datasets behind the views [3]. HCE shows two dendrograms at once, highlights the corresponding terminal nodes in the two dendrograms, and shows the mapping with connecting lines when users click on a branch of a dendrogram [13]. TreeJuxtaposer [12] also applies brushing and linking techniques as well as Focus+Context techniques [8] to compare two large phylogenetic trees with guaranteed visibility. Users can see the discordance of the two hierarchical structures by examining the highlighting and/or connections. Sometimes, the main purpose of selecting an internal node in a tree visualization is to select the set of terminal nodes reachable from that internal node. This problem can be generalized as set visualization, and the main task can be concordance checking among sets.




Figure 2. VennMaster with the same dataset as in Figure 1. We manually placed the labels of some sets using VennMaster.


MetaCrystal [17] is a visualization tool based on the InfoCrystal layout [16] that helps users fuse together search results from different search engines. It utilizes various visual features such as shape, size, color, proximity, and orientation to show the degree of overlap among different search results. Overlapping search results are expected to provide a more comprehensive, relevant, and effective view of the subjects addressed by the search terms. Here again, the tasks users perform with MetaCrystal can be thought of as concordance checking among sets (i.e., search results from different search engines).

When it comes to set visualization, Venn diagrams are the de facto standard. A Venn diagram is a special case of an Euler diagram. Venn diagrams must have areas representing all possible combinations of sets, regardless of whether an area is actually empty. This restriction is loosened in Euler diagrams, where empty areas do not have to appear. These diagrams are applied to various problems in bioinformatics, information retrieval, and information visualization. New applications sometimes impose additional restrictions on how Euler diagrams are drawn, such as requiring each contour to be a circle, and encode more information, such as cardinality, as the size (area) and/or color of a contour. It is important to mention that the terms Venn diagram and Euler diagram are often confused. Euler diagrams in which each contour is a circle are often called Venn diagrams, even though theoretically this is not correct. In this paper, we follow this general perception and use the term Venn diagram for an Euler diagram with the circle constraint.

VennMaster is, to our knowledge, the only visualization tool that shows an arbitrary number of sets in Venn diagrams, where each set is represented as a polygon with a user-defined number of edges [10]. When there are enough edges, each set appears almost like a circle. The size of each polygon is proportional to the cardinality of the corresponding set. All properly size-coded polygons are placed in such a way that the size of each intersection area is also proportional to the number of elements in the intersection. Since optimal size coding and layout determination are too expensive to be solved in a purely analytical way, its authors resort to genetic algorithm techniques.

VennMaster was developed to improve users' interpretation and visualization of the output of GoMiner, a well-known bioinformatics tool. GoMiner enables researchers to query the gene ontology database (www.geneontology.org; a comprehensive annotation of genes or gene products) for associated categories in a cellular context [19]. Since one gene can be associated with more than one gene ontology category, interpreting such complex associations is a challenging task. VennMaster translated this problem into a set relationship visualization problem (i.e., treating a gene ontology category as a set and a gene product as an element). Since the approach was very useful, VennMaster was integrated into GoMiner.

While it is useful to have one more visualization approach adopted in a well-known bioinformatics tool, this approach still has many drawbacks from an information visualization perspective. First of all, there are three kinds of inconsistencies in the VennMaster visualization: (1) not all possible intersections are guaranteed to be visible in the generalized Venn diagram display, so these so-called inconsistent intersections are shown in a separate list view; (2) since it uses regular convex polygons, there will be intersections of polygons to which no element is mapped, as explained in the next section; and (3) the resulting layout of the diagrams can differ in each run of the program because it uses a genetic algorithm to optimize the layout.

A matrix-based representation has often been used to show relationships between items, using both rows and columns to represent items and the value in each cell to show the relationship. For example, Abello and Korn presented matrix and color map based techniques to visualize phone calls made between states [1]. Van Ham used multilevel call matrices in the management of large software projects [18]. Kincaid applied an extended permutation matrix to the exploratory data analysis of multi-experiment microarray studies [11]. Ghoniem et al. used adjacency matrices to interactively visualize and explore relations between constraints and variables in constraint problems [9].

We thought that information visualization techniques could improve users' experience in interpreting such complex set relationships. This can be accomplished without the burden of drawing many circles at the proper scale and location. Furthermore, we can maintain the familiarity of simple diagrams such as Venn diagrams. We applied the permutation matrix display to set concordance visualization to provide a better overview of set concordance without the inconsistencies mentioned above. Interactive selection and filtering methods enable users to narrow down to a handful of sets. The details are shown as a general Venn diagram or our new Fairy diagram after users select two or three sets.

3 Visualizing Concordance of Sets

3.1 Untangling Overlaps

While significant overlaps of many sets in the general Venn diagram visualization tool clearly show high similarities of sets, those overlaps make it difficult to see the details of element membership in sets. In addition, non-overlapped areas are hard to select when overlaps cover most of the elements. We thought that a permutation matrix, a proven multidimensional visual structure, could help untangle overlaps while carrying similarity information. For the set concordance visualization, each column represents an element and each row represents a set (Figure 3). If an element $e_j$ belongs to a set $S_i$, we fill the cell $C(i, j)$ with gray; otherwise $C(i, j)$ is empty. Each set is given a distinctive color, and the set name is displayed at the end of its corresponding row in its own color.
Figure 3. Permutation matrix view for set concordance visualization, showing the concordance of three power analysis results by three probe set signal algorithms with 7643 genes. Aggregation drastically reduced the number of columns from 7643 to 7. The degree of aggregation is shown as histograms in log scale.

We show information regarding elements in the top three rows, which we call the "column header." The column header includes, from top to bottom, Element Name, Membership, and Degree of Aggregation, each in a separate row. The membership row shows pie chart-like glyphs, where each pie slice represents a set to which the corresponding element belongs and is filled with the color of that set. With the color-coded membership information, users can easily grasp how many sets an element belongs to.
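As a minimal sketch of the matrix encoding just described, the Python below builds the binary membership matrix $C$ (rows are sets, columns are elements) and the per-element membership counts shown in the column header. The set and element names are hypothetical, the helper `membership_matrix` is illustrative rather than ConSet's code, and columns are simply sorted alphabetically here instead of using any of ConSet's orderings.

```python
# Hypothetical example data: set names mapped to the elements they contain.
sets = {
    "S1": {"a", "b", "c"},
    "S2": {"b", "c", "d"},
    "S3": {"c", "e"},
}


def membership_matrix(sets, elements):
    """Return (row_names, C) where C[i][j] == 1 iff elements[j] belongs to set row_names[i]."""
    row_names = list(sets)
    matrix = [[1 if e in sets[s] else 0 for e in elements] for s in row_names]
    return row_names, matrix


elements = sorted(set().union(*sets.values()))  # column order: alphabetical in this sketch
row_names, C = membership_matrix(sets, elements)

# Membership row of the column header: number of sets each element belongs to.
membership_counts = [sum(row[j] for row in C) for j in range(len(elements))]

for name, row in zip(row_names, C):
    # '#' stands for a gray-filled cell, '.' for an empty one
    print(f"{name:>3} " + "".join("#" if cell else "." for cell in row))
print("    " + "".join(elements))
print("membership counts:", membership_counts)
```

Element `c` belongs to all three sets, so its column is fully filled and its membership count is 3.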

Since all elements are visible, unlike in Venn diagram visualizations, it is necessary to implement a method to accommodate a large number of columns. It is reasonable to assume that many elements share the same membership when the number of elements is significantly larger than the number of sets. Thus, by aggregating those elements into a single column, it is possible not only to save a significant amount of screen space but also to have a clear overview in a compact form. When several elements are aggregated into a single column, only the representative element is shown in the permutation matrix, and the other aggregated elements are hidden. The column is named after the representative element, which is the aggregated element that comes first in alphabetical order. The number of aggregated elements is given in parentheses at the end of the representative element's name. In addition, the number of aggregated elements is visualized as a blue bar in the Degree of Aggregation row. The height of each bar is proportional to the number of aggregated elements, and users can show the bars in log scale. The intensity of a cell in the permutation matrix is also proportional to the number of aggregated elements.
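The aggregation step can be sketched as grouping elements by their membership signature, i.e., the collection of sets that contain them. This is an illustrative sketch, not ConSet's implementation: the function name `aggregate`, the `visible` parameter (reflecting that ConSet aggregates by membership in the visible sets only), and the choice to drop elements belonging to no visible set are assumptions. The label follows the naming rule above, with the count in parentheses taken to be the group size.

```python
from collections import defaultdict


def aggregate(sets, visible=None):
    """Aggregate elements that share the same membership among the visible sets.

    Returns one (label, members, signature) tuple per column. The label is the
    alphabetically first member followed by the group size in parentheses.
    """
    visible = list(sets) if visible is None else list(visible)
    groups = defaultdict(list)
    # Only elements of at least one visible set get a column (an assumption).
    for element in set().union(*(sets[s] for s in visible)):
        signature = frozenset(s for s in visible if element in sets[s])
        groups[signature].append(element)
    columns = []
    for signature, members in groups.items():
        members.sort()
        columns.append((f"{members[0]} ({len(members)})", members, signature))
    columns.sort(key=lambda column: column[1][0])  # order columns by representative name
    return columns


# Hypothetical example data.
sets = {
    "S1": {"g1", "g2", "g3", "g4"},
    "S2": {"g3", "g4", "g5"},
    "S3": {"g4", "g6"},
}

all_visible = [label for label, _, _ in aggregate(sets)]
two_visible = [label for label, _, _ in aggregate(sets, visible=["S1", "S2"])]
print(all_visible)  # 6 elements collapse into 5 columns
print(two_visible)  # hiding S3 gives g3 and g4 the same signature, so they merge
```

Because the signature depends on which sets are visible, the grouping has to be recomputed whenever set visibility changes.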

3.2 Avoiding Inconsistencies

Venn diagrams are widely used to represent set relationships. While they are intuitive and familiar to users, Venn diagrams have the drawback of inconsistencies: missing valid intersection areas and showing invalid intersection areas. First, assume relationships among three sets A, B, and C, where A and B have some common elements and C has elements in $(A - B)$ and $(B - A)$ but not in $(A \cap B)$. If we represent this relationship in a Venn diagram, the empty set $A \cap B \cap C = \emptyset$ is shown as a region in gray (Figure 4a). If we loosen the constraint that each set should be a circle, this relationship can be represented in an Euler diagram without such an inconsistency (Figure 4b). Then, however, the diagram loses the advantage of being familiar to users. The other inconsistency arises because it is very hard to achieve a valid Venn diagram when the number of sets is large. Furthermore, it is almost impossible to accurately size-code all possible zones. Thus, it is common for some valid intersection areas to be missing from Venn diagrams, especially when many sets intersect with many others.
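To make the first inconsistency concrete, the sketch below enumerates every zone that a three-circle Venn diagram always draws and reports which zones contain no element. The sets are hypothetical values chosen to match the A, B, C example above, and `venn_zones` is an illustrative helper name.

```python
from itertools import combinations


def venn_zones(sets):
    """Map every Venn zone (a non-empty combination of set names) to its elements.

    A zone holds the elements that lie in exactly the sets of the combination.
    """
    names = list(sets)
    zones = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            zones[combo] = inside - outside
    return zones


# Hypothetical sets: C meets A - B and B - A but not A ∩ B.
A, B, C = {1, 2, 3}, {3, 4, 5}, {1, 5, 6}
zones = venn_zones({"A": A, "B": B, "C": C})
empty = [combo for combo, members in zones.items() if not members]
print(f"{len(zones)} zones drawn by a three-circle Venn diagram; empty: {empty}")
```

Only the central zone $A \cap B \cap C$ is empty, yet a circle-based Venn diagram must still draw it.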


Figure 4. Inconsistency of Venn diagrams: (a) Venn diagram with inconsistency; (b) Euler diagram without inconsistency. (a) and (b) show the same set relationships. There is no element in the gray area in (a), but there is no way to avoid this inconsistency in Venn diagrams. By allowing C to have a concave contour, it is possible to avoid the inconsistency in Euler diagrams (b).

To maintain users' familiarity with Venn diagrams while avoiding the two inconsistencies, we suggest applying the information visualization mantra (overview first, zoom and filter, details on demand) [15]. We use a permutation matrix view to show an overview. Dynamic queries, manual selections, and ranking of sets allow users to narrow down to two or three sets to obtain an easy-to-understand diagram. However, even with three sets, Venn diagrams still suffer from the two inconsistencies explained above. Thus, we propose a new diagram named the Fairy diagram, shown in Figure 5. A Fairy diagram does not contain any invalid intersection areas, and all areas are accurately size-coded by the number of elements in their regions. It looks like a roulette wheel, where each set is represented as a fan.


Figure 5. Fairy diagram. (a) Two sets.


For two sets A and B, a circle represents the union $(A \cup B)$. The center angle of the fan for A is calculated as follows.

$$\theta_A = 2\pi \times \frac{n(A)}{n(A \cup B)}$$

where $n(A)$ is the cardinality of set A. The center angle of the fan for B is calculated in the same way. If the intersection $(A \cap B)$ is not empty, the two fans for A and B overlap. The center angle of the overlapping fan is calculated as follows.

$$\theta_{A \cap B} = 2\pi \times \frac{n(A \cap B)}{n(A \cup B)}$$

Therefore, all regions split by the fans of sets A and B are accurately size-coded.
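A worked check of the two-set formulas, assuming sets are given as Python sets so that $n(\cdot)$ is `len`: with $n(A) = 6$, $n(B) = 4$, and $n(A \cap B) = 2$ (so $n(A \cup B) = 8$), the fans get 270°, 180°, and 90°. The helper `fairy_two` is hypothetical; it only computes angles and does not draw anything.

```python
import math


def fairy_two(a, b):
    """Center angles (radians) of the fans in a two-set Fairy diagram."""
    union = len(a | b)
    if union == 0:
        raise ValueError("both sets are empty")
    return {
        "A": 2 * math.pi * len(a) / union,
        "B": 2 * math.pi * len(b) / union,
        "A&B": 2 * math.pi * len(a & b) / union,
    }


angles = fairy_two(set(range(1, 7)), set(range(5, 9)))  # n(A)=6, n(B)=4, n(A∩B)=2
print({k: round(math.degrees(v)) for k, v in angles.items()})
# The two fans minus their overlap cover the full circle exactly once.
assert math.isclose(angles["A"] + angles["B"] - angles["A&B"], 2 * math.pi)
```

The final assertion is the reason the regions are size-coded: the A-only, B-only, and overlap sectors partition the circle in proportion to their element counts.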

For three sets A, B, and C, a circle represents the union $(A \cup B \cup C)$. The intersection $(A \cap B \cap C)$ is represented as a center circle. If the outer circle has radius $R$, the radius $r$ of the center circle is calculated as follows.

$$\frac{n(A \cap B \cap C)}{n(A \cup B \cup C)} = \frac{\pi r^2}{\pi R^2}, \qquad r = \sqrt{\frac{n(A \cap B \cap C)}{n(A \cup B \cup C)}} \times R$$

Thus, the area of the center circle for the set $(A \cap B \cap C)$ is exactly proportional to the cardinality of $(A \cap B \cap C)$. A doughnut-shaped region between the center and outer circles represents the set $((A \cup B \cup C) - (A \cap B \cap C))$. In the doughnut-shaped region, there are three doughnut segments for the three sets $A - (A \cap B \cap C)$, $B - (A \cap B \cap C)$, and $C - (A \cap B \cap C)$. Each doughnut segment has a center angle in proportion to the cardinality of the corresponding set. The center angle of the doughnut segment for the set $(A - (A \cap B \cap C))$ is calculated as follows.

$$\theta_{A - (A \cap B \cap C)} = 2\pi \times \frac{n(A) - n(A \cap B \cap C)}{n(A \cup B \cup C) - n(A \cap B \cap C)}$$

Thus, we can accurately size-code all regions split by the center and outer circles and the three doughnut segments.
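The three-set construction can be checked the same way. The sketch below, with the hypothetical helper `fairy_three`, computes the center radius $r$ and the doughnut-segment angles for example sets where $n(A \cup B \cup C) = 9$ and $n(A \cap B \cap C) = 1$, giving $r = R/3$ and angles of 180°, 135°, and 135°. Returning a zero angle when all three sets coincide (so the doughnut region is empty) is an assumption that the text does not cover.

```python
import math


def fairy_three(a, b, c, outer_radius=1.0):
    """Center-circle radius and doughnut-segment angles (radians) for a three-set Fairy diagram."""
    n_union = len(a | b | c)
    n_core = len(a & b & c)
    if n_union == 0:
        raise ValueError("all three sets are empty")
    # Area of center circle / area of outer circle = n(A∩B∩C) / n(A∪B∪C).
    r = math.sqrt(n_core / n_union) * outer_radius
    n_ring = n_union - n_core  # elements drawn in the doughnut-shaped region
    angles = {
        name: (2 * math.pi * (len(s) - n_core) / n_ring if n_ring else 0.0)
        for name, s in (("A", a), ("B", b), ("C", c))
    }
    return r, angles


# Hypothetical example sets.
A, B, C = {1, 2, 3, 4, 5}, {4, 5, 6, 7}, {5, 7, 8, 9}
r, angles = fairy_three(A, B, C)
print(round(r, 4), {k: round(math.degrees(v)) for k, v in angles.items()})
```

A segment with angle $\theta$ covers the fraction $\theta / 2\pi$ of the doughnut, whose area is proportional to $n(A \cup B \cup C) - n(A \cap B \cap C)$, so its area ends up proportional to $n(A) - n(A \cap B \cap C)$.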

While Fairy diagrams have advantages such as no inconsistencies and accurate size coding, as shown above, there are some problems with this approach. For example, circles and doughnut-shaped regions are, in theory, drawn within a circle, and parts of some outer arcs can overlap each other. Thus, it is sometimes difficult to know the exact bounds of a region. This problem can be attenuated by drawing region boundaries with a tiny displacement, as shown in Figure 5.


3.3 Ordering Sets and Elements

The ordering of columns and rows significantly influences the pattern of a permutation matrix. Generally, the goal of reordering in a permutation matrix is to move significant cells to the diagonal of the matrix [7]. Since this goal does not apply to our permutation matrix for set concordance visualization, whose rows (sets) and columns (elements) index different kinds of objects, we propose three different ordering methods. First, we suggest Minimum-Cost Spanning Tree (MST) ordering, where elements of similar membership are placed close to each other. To apply Prim's algorithm [2], one of the MST construction algorithms, each element is represented as a vertex and the relationship between every pair of elements is represented as an edge whose cost decreases as their memberships become more similar. The cost between two elements is calculated as follows.

$$\mathrm{Cost}(e_i, e_j) = 1 - \frac{\text{number of sets containing both } e_i \text{ and } e_j}{\text{number of all sets}}$$

As the number of sets that contain both $e_i$ and $e_j$ increases, the cost becomes smaller (i.e., the two elements are more similar). After calculating the costs of every pair of elements, Prim's algorithm is used to build a minimum-cost spanning tree. The vertex corresponding to the element that belongs to the most sets serves as the start vertex. The next vertex to be added is the vertex that is not in the current spanning tree and is closest to some vertex in the tree. The elements are ordered according to the sequence in which their vertices are added by the algorithm.
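A minimal sketch of the MST ordering for elements, following the steps above: compute the pairwise cost, start from the element that belongs to the most sets, and record the order in which Prim's algorithm adds vertices. Breaking ties by element name is an assumption (the text does not specify tie-breaking), the helper names are hypothetical, and this simple O(n²) version computes edge costs on the fly rather than storing a full cost table.

```python
def element_cost(ei, ej, sets):
    """Cost(e_i, e_j) = 1 - (# of sets with both e_i and e_j) / (# of all sets)."""
    shared = sum(1 for members in sets.values() if ei in members and ej in members)
    return 1 - shared / len(sets)


def prim_order(items, cost, start):
    """Return items in the order in which Prim's algorithm adds them to the spanning tree."""
    order = [start]
    remaining = set(items) - {start}
    best = {v: cost(start, v) for v in remaining}  # cheapest edge from the tree to v
    while remaining:
        nxt = min(remaining, key=lambda v: (best[v], v))  # ties broken by name (assumption)
        order.append(nxt)
        remaining.remove(nxt)
        for v in remaining:
            best[v] = min(best[v], cost(nxt, v))
    return order


# Hypothetical example data.
sets = {"S1": {"a", "b", "c"}, "S2": {"b", "c", "d"}, "S3": {"c", "e"}, "S4": {"d", "e"}}
elements = sorted(set().union(*sets.values()))
memberships = {e: sum(e in m for m in sets.values()) for e in elements}
start = min(elements, key=lambda e: (-memberships[e], e))  # element in the most sets
order = prim_order(elements, lambda x, y: element_cost(x, y, sets), start)
print(order)
```

Here `c` (in three sets) is the start vertex and `b`, which shares two sets with it, is added next.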

Sets are ordered in the same way as elements except for the cost function, which is defined as follows.

$$\mathrm{Cost}(S_i, S_j) = 1 - \frac{n(S_i \cap S_j)}{n(S_i \cup S_j)}$$
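The set cost is one minus the Jaccard similarity, often called the Jaccard distance. The sketch below computes it for every pair of a few hypothetical sets; using this function in place of the element cost in a Prim-style ordering gives the set ordering described above. The helper name `set_cost` and returning 0 for two empty sets are assumptions.

```python
from itertools import combinations


def set_cost(si, sj):
    """Cost(S_i, S_j) = 1 - n(S_i ∩ S_j) / n(S_i ∪ S_j), i.e. the Jaccard distance."""
    union = si | sj
    if not union:
        return 0.0  # two empty sets are treated as identical (an assumption)
    return 1 - len(si & sj) / len(union)


# Hypothetical example data.
sets = {"S1": {"a", "b", "c"}, "S2": {"b", "c", "d"}, "S3": {"c", "e"}, "S4": {"d", "e"}}
for x, y in combinations(sets, 2):
    print(x, y, round(set_cost(sets[x], sets[y]), 3))
```

Unlike the element cost, the denominator is the size of the pair's union rather than the total number of sets, so two small sets that overlap heavily count as very similar.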


Since MST ordering of sets and elements significantly improves the layout of the permutation matrix, concordance among sets, and even among elements, can be examined more efficiently.

While MST ordering helps users identify similar elements and sets, additional ordering methods are useful for other tasks, such as finding a specific element or the biggest set. For example, it is easier to find an element when elements are in alphabetical order. We provide three additional ordering methods for elements (move a column to the right end, order by name, and order by the number of memberships) and two more for sets (order by name and order by cardinality).
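The remaining orderings are plain sorts. In this sketch, the function names, the descending direction for cardinality and membership counts, and name-based tie-breaking are all assumptions; the hypothetical `pinned` parameter models moving selected columns to the right end, keeping them in the order given.

```python
def order_sets(sets, by):
    """Order set names by 'name' or by 'cardinality' (largest first, ties by name)."""
    if by == "name":
        return sorted(sets)
    if by == "cardinality":
        return sorted(sets, key=lambda s: (-len(sets[s]), s))
    raise ValueError(f"unknown set ordering: {by}")


def order_elements(sets, by, pinned=()):
    """Order elements by 'name' or 'memberships' (most first); pinned ones go to the right end."""
    elements = set().union(*sets.values())
    count = {e: sum(e in members for members in sets.values()) for e in elements}
    if by == "name":
        ordered = sorted(elements)
    elif by == "memberships":
        ordered = sorted(elements, key=lambda e: (-count[e], e))
    else:
        raise ValueError(f"unknown element ordering: {by}")
    return [e for e in ordered if e not in pinned] + list(pinned)


# Hypothetical example data.
sets = {"Alpha": {"x", "y"}, "Beta": {"w", "x", "y", "z"}, "Gamma": {"w", "y", "z"}}
print(order_sets(sets, "cardinality"))                     # ['Beta', 'Gamma', 'Alpha']
print(order_elements(sets, "memberships"))                 # ['y', 'w', 'x', 'z']
print(order_elements(sets, "memberships", pinned=("x",)))  # ['y', 'w', 'z', 'x']
```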


for  element  re-ordering  shows  up.  Selecting  a  menu  item,  users 
can  move  elements  to  the  right  end  of  the  column.  This  enables 
users  to  easily  compare  several  elements  of  interest  by  putting 
them  side  by  side  and  right  next  to  the  set  names.  Similar  to  sets, 
elements  can  also  be  sorted  by  three  criteria;  alphabetically,  by  the 
number  of  memberships,  and  by  MST  ordering. 

When  users  move  the  mouse  over  a  column  header  of  an 
element,  ConSet  highlights  the  corresponding  column  with  a 
greenish-gray  rectangle.  In  addition,  the  names  of  sets  that  do  not 
contain  that  element  are  grayed  out.  This  helps  users  identify  all 
the  sets  that  an  element  belongs  to.  The  name  of  the  element  is 
also  shown  in  the  elements  list  in  the  Diagram  Ordering  view 
along  with  their  membership  information.  If  the  column  is 
aggregated,  the  names  of  all  the  aggregated  elements  are  shown. 

Similarly,  if  users  move  the  mouse  over  a  set  name,  the 
corresponding  row  is  highlighted  with  a  rectangle  in  the  set’s  own 
color.  The  names  of  elements  that  do  not  belong  to  the  selected 
set  are  grayed  out.  The  names  of  all  the  elements  of  the 
highlighted  set  come  in  the  elements  list.  If  users  move  the 
mouse  over  a  gray-filled  cell  C(z,  j)  in  the  Permutation  Matrix 
view,  the  cell  is  highlighted  by  a  red  rectangle  with  the  y-th 
element’s  name  highlighted  in  red  and  the  z-th  set’s  name 
underlined  in  red.  The  name  of  the  y-th  element  and  the  names  of 
its  aggregated,  if  any,  elements  are  shown  in  the  elements  list. 

4. 1 .2  Dynamic  Filtering  of  Sets  and  Elements 

ConSet,  by  default,  shows  the  names  of  all  the  sets  in  the  sets 
list  in  the  Dynamic  Control  view.  It  allows  users  to  change  the 
visibility  of  sets  in  the  Permutation  Matrix  view.  For  example,  if 
users  check  (or  uncheck)  a  check  box  right  before  a  set  name  in 
the  sets  list,  ConSet  shows  (or  hides)  the  set  in  the  Permutation 
Matrix  view.  This  enables  users  to  identify  similar  ones  among 
the  sets  of  their  interest.  For  example,  the  number  of  sets  was 
reduced  from  21  (Figure  6a)  to  10  (Figure  6b)  when  we  hid  the 
sets  whose  cardinality  is  less  than  30.  The  aggregation  of 
elements  is  based  on  their  memberships  to  the  visible  sets,  not  to 
all  the  sets.  So,  whenever  the  visibility  of  sets  changes,  ConSet 
re-computes  aggregation. 

ConSet also enables users to filter the elements shown in the
Permutation Matrix view. For example, the “Filter elements to
show” slider control, set to a value t, shows only the elements
that belong to at least t sets. Filtered elements or sets can either
be removed from or grayed out in the Permutation Matrix view.
The number of elements was further reduced from 133 (Figure 6b)
to 24 (Figure 6c) when we filtered out the elements that belong to
fewer than 5 sets.
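The set visibility, aggregation, and slider filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ConSet's implementation: it assumes that elements are aggregated when they have an identical membership pattern over the visible sets, and that the slider threshold t counts visible sets only. The function name `visible_rows` is hypothetical.

```python
from collections import defaultdict


def visible_rows(sets, visible, t=1):
    """Group elements for a permutation-matrix-style view (hypothetical helper).

    sets    -- dict mapping each set name to a set of element names
    visible -- names of the sets whose check boxes are checked
    t       -- slider value: keep elements that belong to at least t visible sets

    Returns a dict mapping each membership pattern (a frozenset of visible
    set names) to the sorted elements aggregated under that pattern.
    Assumption: "aggregation" means identical membership over the visible sets.
    """
    membership = defaultdict(set)
    for name in visible:
        for element in sets[name]:
            membership[element].add(name)

    groups = defaultdict(list)
    for element, names in membership.items():
        if len(names) >= t:  # the slider filter
            groups[frozenset(names)].append(element)
    return {pattern: sorted(elems) for pattern, elems in groups.items()}


# Hiding or showing a set just means calling visible_rows again,
# which recomputes the aggregation over the new visible sets.
sample = {"A": {"g1", "g2", "g3"}, "B": {"g2", "g3"}, "C": {"g3", "g4"}}
rows = visible_rows(sample, ["A", "B"])
```

Because the aggregation key is computed only from the visible sets, two elements that differ solely in a hidden set collapse into one row, which is why the aggregation must be recomputed after every visibility change.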


4  ConSet  Interface 

We developed a visualization tool named ConSet by applying the
design ideas described in the previous section. ConSet enables
users to examine the concordance of sets visually and
interactively. ConSet consists of three views: the Permutation Matrix,
Dynamic Control, and Diagram Ordering views (Figure 1). The
Permutation Matrix view shows an overview of all the visible sets.
The Dynamic Control view on the right contains the sets list, the
diagram area, and the filter controls. The Diagram Ordering view
at the bottom has the ranked diagrams area and the elements list.

4.1.1 Easy Access to Sets and Elements

ConSet, by default, arranges the sets in MST ordering.
Since this places sets with more common elements closer to each
other, users can easily find similar sets. The sets can also be
ordered by name or by cardinality from the sets list in the
Dynamic Control view.
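The MST ordering is not spelled out in this section, so the following is only one plausible reading, not necessarily ConSet's exact procedure: use the number of shared elements as the edge weight between two sets, grow a maximum-weight spanning tree with Prim's algorithm starting from the largest set, and list the sets in depth-first order of that tree. The function name `mst_order` and the choice of root are assumptions.

```python
def mst_order(sets):
    """Order set names so that sets sharing many elements are adjacent.

    Hypothetical sketch: edge weight = number of shared elements; Prim's
    algorithm builds a maximum-weight spanning tree rooted at the largest
    set; sets are emitted in depth-first preorder, heavier edges first.
    """
    names = sorted(sets)
    if not names:
        return []

    def shared(a, b):
        return len(sets[a] & sets[b])

    root = max(names, key=lambda n: len(sets[n]))  # ties: alphabetical
    children = {n: [] for n in names}
    # best[n] = (weight, parent) of the heaviest edge from the tree to n
    best = {n: (shared(root, n), root) for n in names if n != root}
    while best:
        nxt = max(best, key=lambda n: best[n][0])
        _, parent = best.pop(nxt)
        children[parent].append(nxt)
        for n in best:  # only values change, so iterating is safe
            w = shared(nxt, n)
            if w > best[n][0]:
                best[n] = (w, nxt)

    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push lighter children first so the heaviest child is visited next.
        stack.extend(sorted(children[node], key=lambda c: shared(node, c)))
    return order


sets = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {7, 8},
        "D": {7, 8, 9}, "E": {3, 7}}
order = mst_order(sets)  # A and B end up adjacent, as do C and D
```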

ConSet  also  provides  four  element  re-ordering  methods.  When 
users  right-click  the  mouse  on  a  column  header,  a  pop-up  menu 


4.1.3 Showing Relationships between Sets

ConSet visualizes the relationship of two or three sets in the
diagram area in the Dynamic Control view. Users can add up to
three sets to the diagram area from the sets list. When users
select a set in the sets list, the corresponding set is highlighted in
the Permutation Matrix view, and the names of all the elements
of the selected set are shown in the elements list in the Diagram
Ordering view. When users click the “Add” button at the bottom
of the sets list, the selected sets are added to the diagram area. The
names of the added sets are displayed in the upper window of the
diagram area, and a diagram of their relationship is drawn in the
lower window. Users can remove sets from the diagram area by
selecting them in the upper window and clicking the “Delete”
button. They can also clear the diagram area by clicking the
“Clear” button.

When users move the mouse over a set in a Venn diagram or a
Fairy diagram, a tooltip appears showing its name and cardinality.
At the same time, the set is highlighted in the Permutation Matrix
view, and information about its elements is shown in the
elements list. When users move the mouse over a region for an
intersection, the elements in the intersection are highlighted in the
Permutation Matrix view, and their information appears in the
elements list. If users click on a region in a diagram, the
corresponding region is selected, and another click toggles the
selection off. This enables users to examine all the elements in the
elements list even when scrolling is required.
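Each region of a two- or three-set Venn or Fairy diagram corresponds to the elements that belong to exactly one combination of the selected sets. The sketch below, built around a hypothetical `diagram_regions` function, computes those disjoint regions; it covers only the membership bookkeeping behind hovering and clicking, not the diagram layout itself.

```python
from itertools import combinations


def diagram_regions(selected):
    """Split two or three sets into the disjoint regions of a set diagram.

    selected -- dict mapping set name -> set of elements (two or three entries)
    Returns a dict mapping a tuple of set names to the elements that belong
    to exactly those sets and to none of the other selected sets.
    (Hypothetical helper, not ConSet's API.)
    """
    if not 2 <= len(selected) <= 3:
        raise ValueError("the diagram area compares two or three sets")
    names = sorted(selected)
    regions = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            inside = set(selected[combo[0]]).intersection(
                *(selected[n] for n in combo[1:]))
            outside = set().union(*(selected[n] for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions


regions = diagram_regions({"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}})
# Hovering over the whole A-and-B intersection would highlight the union of
# regions ("A", "B") and ("A", "B", "C").
```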
(a) Original data
(b) Filtering out sets whose cardinality is less than 30 from (a)
(c) Filtering out elements that belong to fewer than 5 sets from (b) (10 of 21 sets and 24 of 163 elements shown)

Figure 6. Sets and Elements Filtering with Human Muscular
Dystrophy Dataset of 21 sets and 163 elements.


color coded by the ratio of the cardinalities of the two matching sets.
This color coding is intended to penalize appropriately the cases
where one big cluster from one clustering result overlaps with
several small clusters from the other clustering result, which is not
a very interesting form of concordance.
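ConSet's extension for clustering results groups each cluster with similar clusters from the other result and colors matching elements by the ratio of the cardinalities of the two matching clusters. The sketch below is a simplified, hypothetical approximation, not the authors' algorithm: each cluster of the second result joins the group of the first-result cluster it overlaps most; an element "matches" when its two clusters fall in the same group; its intensity is min/max of the two cluster sizes, and non-matching elements (like an outlier) get 0.

```python
def concordance_intensity(result_a, result_b):
    """Per-element concordance intensity between two clusterings.

    result_a, result_b -- non-empty lists of disjoint clusters (sets of ids)
    Returns {element: intensity in [0, 1]} for elements present in both.
    Assumptions (hypothetical): grouping by maximum overlap; intensity is
    min(|a|, |b|) / max(|a|, |b|), which penalizes a big cluster matched
    against a small one.
    """
    where_a = {e: i for i, c in enumerate(result_a) for e in c}
    where_b = {e: j for j, c in enumerate(result_b) for e in c}
    # For each cluster in result_b, the index of its best-overlapping cluster in result_a.
    group_of_b = [max(range(len(result_a)), key=lambda i: len(result_a[i] & cb))
                  for cb in result_b]

    intensity = {}
    for e in where_a.keys() & where_b.keys():
        i, j = where_a[e], where_b[e]
        if group_of_b[j] == i:
            size_a, size_b = len(result_a[i]), len(result_b[j])
            intensity[e] = min(size_a, size_b) / max(size_a, size_b)
        else:
            intensity[e] = 0.0  # element is not in a matching cluster pair
    return intensity


hcluster = [{1, 2, 3, 4}, {5, 6}]
kcluster = [{1, 2, 3}, {4, 5, 6}]
scores = concordance_intensity(hcluster, kcluster)
```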

Figure 7 and Figure 8 visualize the concordance between the
hierarchical clustering result and the K-means clustering result, both
using the Euclidean distance measure, for data on 77 breakfast
cereals and for Census data of 224 US eastern counties near MD,
respectively. The many dense red cells in the Cluster Concordance
row in Figure 7 indicate that the two results are highly concordant
with each other, despite an outlier, “Multigrain Cheerios,” which
does not belong to any matching cluster pair. On the other hand,
Figure 8 shows that, overall, the two clustering results for the
census data set are not very concordant, even though there are
several strongly matching groups of counties with dense red cells
in the Cluster Concordance row.
4.1.4 Diagram Ordering using the Rank-by-Feature Framework

We applied the Rank-by-Feature Framework [14] to ConSet.
The Diagram Ordering view shows the top 10 diagrams ranked by
a chosen criterion. From the “Domain” combo box at the top left
corner of the view, users can choose to rank diagrams of pairs of
sets or of triples of sets. Two ranking criteria are provided in the
“Ranking criteria” combo box. The criterion “intersection size”
ranks diagrams by the size of the intersection, and the criterion
“overlap metric” orders diagrams by the ratio of the intersection
size to the union size. This helps users easily find collections of
important sets that meet the ranking criterion. Users can view each
of the top 10 ranked diagrams in two ways: as a Venn diagram or
as a Fairy diagram. These work the same as in the diagram area of
the Dynamic Control view.
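The two ranking criteria can be illustrated with a short sketch. The function name `rank_diagrams` and the criterion strings `"intersection"` and `"overlap"` are hypothetical stand-ins for the combo-box options; the overlap metric follows the definition above, the size of the intersection divided by the size of the union.

```python
from itertools import combinations


def rank_diagrams(sets, size=2, criterion="intersection", top=10):
    """Rank pairs (size=2) or triples (size=3) of sets for diagram display.

    criterion -- "intersection": size of the common intersection
                 "overlap": |intersection| / |union| (the overlap metric)
    Ties keep the alphabetical order of the set-name tuples (stable sort).
    """
    if criterion not in ("intersection", "overlap"):
        raise ValueError(f"unknown criterion: {criterion}")

    def score(combo):
        members = [sets[name] for name in combo]
        inter = set(members[0]).intersection(*members[1:])
        if criterion == "intersection":
            return len(inter)
        union = set().union(*members)
        return len(inter) / len(union) if union else 0.0

    return sorted(combinations(sorted(sets), size), key=score, reverse=True)[:top]


sets = {"A": set(range(10)), "B": set(range(5)), "C": {20, 21}, "D": {20, 21, 22}}
by_size = rank_diagrams(sets, top=1)                         # largest intersection
by_overlap = rank_diagrams(sets, criterion="overlap", top=1)  # most similar pair
```

The two criteria can disagree: here A and B share the most elements, while C and D have the higher overlap ratio, which is why offering both is useful.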

5 Other Application Examples

We extended ConSet with special functionality to help users
compare clustering results. The output of a clustering
algorithm is in most cases a group of disjoint clusters (i.e., sets),
each of which is a set of elements. ConSet arranges the sets into
several groups, in which a set from one clustering result is put
together with one or more similar sets from the other clustering
result. ConSet arranges these groups row by row and adds a
special row (Cluster Concordance) right before the first group,
where all matching elements within a group are projected and

Figure 7. Clustering Results Comparison (HCLUSTER:
Hierarchical Clustering, KCLUSTER: K-means Clustering) with
77 breakfast cereals data
Figure 8. Clustering Results Comparison (HCLUSTER:
Hierarchical Clustering, KCLUSTER: K-means Clustering) with
Census data of 224 US eastern counties near MD


The same approach can also help users identify statistical
associations between categorical variables or between a clustering
result and a categorical variable. Users can partition a dataset into
disjoint sets according to a categorical variable. For example, the
census data for all US counties can be partitioned into disjoint sets
according to categorical variables such as “poverty level” and
“education level.” Since an integer- or real-type variable can be
converted to a categorical variable by simple binning, ConSet
can also visualize statistical associations between a categorical
variable and an integer- or real-type variable.


6 ConSet Evaluation

We conducted a qualitative study to understand how well
ConSet works and to identify any usability issues. We originally
wanted to compare three approaches: Treemap layout [5] (shown
in Figure 9), permutation matrix, and VennMaster. However,
from our own experience with ConSet, we suspected that the
Treemap layout approach would work best only for identifying the
biggest sets. Since this task can easily be completed by the other
approaches with a sorting feature, we decided to compare only
ConSet, with its permutation matrix, to VennMaster. We measured
the time to complete each task using a stopwatch and counted the
number of wrong answers, time-outs, and give-ups. The
experimenter also took notes on usability issues participants
experienced during the walkthrough of the system.
Figure 9. ConSet with Strip Treemap Layout. Each box
represents a set, which has a unique border color. The set name and
its cardinality are shown at the top left corner of each box. The
box size is proportional to the number of elements in the set.

6.1  Data  and  Participants 

We used two similar datasets exported from GoMiner for this
user study. One dataset had 16 sets and 31 elements, and the other
had 23 sets and 28 elements. ConSet is implemented to import a
pair of GoMiner files: the category summary file and the gene
summary by category file. From this pair of files, ConSet builds a
number of sets of genes, each of which corresponds to a gene
ontology category.
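The exact layout of GoMiner's export files is not described here, so the following sketch assumes a simplified, hypothetical tab-separated input with one (category, gene) pair per line. It only shows how such pairs turn into one set of genes per category; the function name `sets_from_pairs` and the column positions are assumptions, not GoMiner's actual format.

```python
import csv
from collections import defaultdict


def sets_from_pairs(lines, category_col=0, gene_col=1):
    """Build {category: set of genes} from tab-separated lines.

    Hypothetical input format: each line holds a category and a gene in
    the given columns; blank or short lines are skipped.
    """
    sets = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) <= max(category_col, gene_col):
            continue  # skip blank or malformed lines
        sets[row[category_col].strip()].add(row[gene_col].strip())
    return dict(sets)


sample = ["GO:1\tgeneA", "GO:1\tgeneB", "", "GO:2\tgeneB"]
gene_sets = sets_from_pairs(sample)
```

Any iterable of strings works as input, so an open file handle can be passed in place of the list.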

We recruited 8 biologists (5 males and 3 females), including 1
male pilot subject. The pilot data are not included in the reporting
of the experimental task data because the interfaces and tasks
were improved after the pilot.

6.2  Procedure  and  Tasks 

Each participant used both interfaces; interface order was
counterbalanced. Participants first received training on the first
interface and were allowed to play with the program to learn the
basic features. They were allowed to ask questions during the
training. For each interface, participants spent about 10 minutes
on average. Next, they were asked to conduct 9 tasks as quickly
as possible. Each task had a 3-minute time limit, and
participants were allowed to give up a task at any time. After a
short break, the same procedure was repeated with the second
interface. Preferences, comments, and suggestions were collected
during debriefing. Each session lasted 38 minutes on average.
The list of tasks follows.

1. What are the top three biggest sets?

2.  What  is  the  size  of  the  biggest  set? 

3.  What  are  the  top  three  elements  that  belong  to  the  most  sets? 

4.  Name  the  sets  that  have  a  given  element. 

5.  Name  the  sets  that  have  two  given  elements. 

6.  What  are  three  sets  that  share  the  most  elements? 

7. Name the elements in the intersection of two sets.

8. Name the elements in the intersection of three sets.

9.  Name  the  elements  that  are  in  A  but  not  in  B. 

6.3  Results 

6.3.1 Task Times, Errors, and Preferences

Out of 63 task instances across participants (7 participants × 9
tasks), there were only 6 time-outs and 5 incorrect answers with
ConSet, compared with 30 time-outs and 10 incorrect answers with
VennMaster. With ConSet, two participants forgot how to use
diagram ordering for task 6, two participants were unable to
complete task 9, and one participant each was unable to complete
tasks 1 and 5. Completion times for timed-out tasks were excluded
from the task time analysis.

As can be seen from Figure 10, participants completed most
tasks faster with ConSet. In fact, with VennMaster no one could
complete tasks 3, 4, or 5 within the 3-minute time limit, whereas
7, 6, and 5 participants answered correctly with ConSet for tasks
3, 4, and 5, respectively. We believe this is because ConSet
provides good support for showing the names of elements.


Figure  10.  Average  task  completion  times 

When asked which interface they preferred overall, 6 out of 7
participants chose ConSet over VennMaster. Their reasons
included “I was able to complete all tasks,” “I like
interactive highlighting,” and “more user-friendly.” The one
participant who preferred VennMaster said that it is simple and
that she got used to it. She also said that she might change her
preference if she became comfortable with the Permutation Matrix
view by using it more. Another participant, who preferred
ConSet, said that more training time is needed to get used to
ConSet.


6.3.2  Usability  Issues 

We observed several usability issues in ConSet that need to
be addressed. There was clear user frustration around the selection
of sets in the Dynamic Control view on the right. Three
participants had difficulty choosing sets to show in the diagram
area. Even though the check box in front of a set name filters the
sets shown in the main Permutation Matrix view, some of the
participants thought that the checked sets would be added to the
diagram area.

Another issue is that there is no way to select the difference
area (A - B). This is because a single click behaves differently
depending on where users click: a click on the intersection area
selects the intersection, but a click on the difference area selects
the entire set. To address this issue, we can introduce a more
consistent interaction style for selecting areas in the Venn and
Fairy diagrams. First, a single click selects the smallest containing
area, so if users click on the difference or intersection area, the
difference or intersection is selected. Second, users can combine
two areas by clicking an area with the Control key held down.
Lastly, a double click on an area selects all the sets that contain the
area, so users can select an entire set by double-clicking on its
difference area.
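The proposed interaction style can be modeled as a small state machine over the smallest diagram areas, each keyed by the tuple of sets whose elements it holds. The class `DiagramSelection` and its method names are hypothetical; the sketch also assumes that a Control-click on an already selected area removes it, and that a plain second click on the only selected area toggles it off, matching the toggle behavior of the current interface.

```python
class DiagramSelection:
    """Hypothetical model of the proposed click semantics.

    regions -- dict mapping a tuple of set names (a smallest area: the
               elements in exactly those sets) to its set of elements
    """

    def __init__(self, regions):
        self.regions = regions
        self.selected = set()  # keys of the currently selected smallest areas

    def click(self, region, ctrl=False):
        """Single click selects the smallest area; Control-click combines areas."""
        if ctrl:
            self.selected ^= {region}  # add the area, or drop it if already selected
        elif self.selected == {region}:
            self.selected = set()  # a second click on the same area toggles it off
        else:
            self.selected = {region}

    def double_click(self, region):
        """Select every set that contains the area, i.e. all of those sets' areas."""
        owners = set(region)
        self.selected = {key for key in self.regions if owners & set(key)}

    def elements(self):
        """Elements shown in the elements list for the current selection."""
        return set().union(*(self.regions[key] for key in self.selected))


regions = {("A",): {1}, ("B",): {4}, ("A", "B"): {2, 3}}
selection = DiagramSelection(regions)
selection.click(("A",))         # the difference area A - B only
selection.double_click(("A",))  # the entire set A
```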

There is no efficient way to find elements or sets by name.
Even though ConSet enables users to sort elements and sets by
name, four participants did not use the sort feature and
sequentially scanned element names for task 4. This would be a
bigger problem when the number of elements is large. We can
address this issue by providing a simple search on element and
set names.

Users' familiarity with the traditional Venn diagram makes it hard
for them to take advantage of the new Fairy diagram. In addition,
the tasks used in the study were easy enough to be completed with
Venn diagrams. However, we believe that instantaneous
highlighting of the area on mouse-over, along with informative
tooltip text, helped users understand how to interpret the diagram.
It was encouraging to observe that some users utilized the Fairy
diagram after a short tutorial.

7  Conclusion 

We developed a general set visualization tool called ConSet,
built upon the permutation matrix, which supports important tasks
for concordance analysis of sets and elements. ConSet shows an
overview of relationships among sets and helps users efficiently
perform fundamental set operations such as intersection and
difference. ConSet provides the top 10 collections of sets that are
most similar, measured either by the number of common items or
by the overlap metric. ConSet also enables users to aggregate and
filter sets and elements, which improves scalability.

Our Fairy diagram addresses two inconsistency problems that
may occur in Venn diagrams: missing valid intersection areas and
showing invalid intersection areas. It also provides exact size
coding of all areas, and the intersection of three sets is clearly
visualized as a center circle. The permutation matrix display makes
it possible to avoid the problem of too many overlapping sets in
general Venn diagrams. Another strength of the permutation
matrix is that it provides better support for showing the names of
elements. ConSet performed much better when tasks required
users to access information through elements.

We conducted a qualitative user study to evaluate how our tool
works in comparison with a traditional set visualization tool based
on a Venn diagram. We found that users performed better with
ConSet than with the traditional interface for many tasks, and most
users preferred ConSet.